Detecting Dental Epidemics


Dental diseases are typically regarded as if they were
non-communicable. For this reason, training in methods appropriate
to the analysis of classical epidemics is not common during
dental education. Recent emphasis on the control of
cross-contamination potentials would seem to enhance the importance
of epidemiological monitoring of disease outbreaks. Monitoring
allows for the identification of unusual clustering and, perhaps
eventually, the determination of causative agents.

Disease-related events (henceforth simply called events) may
cluster in either space or time. The critical statistical issue
is whether clustering of a particular magnitude is unusual or
could have resulted from chance fluctuations. Clustering in
space (e.g., differences in event rates between geographic areas
or practitioners) can be detected using a variety of
well-documented statistical techniques (1,2). In general, these
methods are used to determine whether or not there are
significant differences among proportions or counts.

These methods, as well as others including the cluster index
(3) and cumulative sum (CUSUM) (4), can also be used to evaluate
the consistency of event rates between time intervals. If the
data are available ungrouped (that is, with the time of each
event rather than counts per period), these techniques are not
optimal because grouping discards information. An epidemic
may not be detected simply because its occurrence overlaps the
end of one arbitrary interval and the beginning of the next.

The purpose of this paper is to describe efficient methods
for the statistical analysis of continuous time distributions of
dental events. A single data example is chosen, but the methods
could apply in a variety of situations. The following data could
conceivably represent post-surgical complications following
removal of third molars, periapical infections following a first
phase of endodontic treatment, and so forth.

The  Data  Set 

Figure 1 describes the incidence of diagnosed pericoronitis
(PCOR) among personnel of a U.S. aircraft carrier during the
final 140 days of a six-month Mediterranean cruise during 1987.
Data collection began immediately upon arrival of the second
author to the carrier's dental department. The great majority of
personnel were on board for the entire cruise, but the ship's
complement did vary between approximately 4,600 and 4,700. There
were 12 reported cases of PCOR, and 10 of these fell into two
5-case clusters of eight and seven days' duration.

If events distribute themselves within a time interval
according to a random process, they should be uniformly
distributed across the observational period. Although the two
clusters beginning on days 41 and 130 seem unusual and
non-random, specification of a probability statement is required
for inferential purposes.


Several statistical techniques will be applied to these data.
In actual practice, the use of multiple tests, and the selection
of tests after the data have been examined, reduces statistical
validity (which may not be a serious problem when intentions are
exploratory). The null hypothesis will always be that the PCOR
events were distributed uniformly during the 140 days of
observation, while the alternate hypothesis will be that the
sample could not have been selected from a uniform distribution
of events (at p<.05).

Kolmogorov-Smirnov

The best-known test that can be applied to these data is
the Kolmogorov-Smirnov goodness-of-fit test (5,6). The basic
idea behind this test is to identify the maximum absolute
deviation between the sample cumulative distribution, $F_s(x)$, and
the theoretical cumulative distribution, $F_t(x)$. The test
statistic D is defined as:

$$D = \max_x \left| F_s(x) - F_t(x) \right|.$$

This value can be determined graphically by inspecting the
cumulative distributions or computed algebraically. If computed,
it is important to recognize that the largest vertical difference
between $F_s(x)$ and $F_t(x)$ may not occur at an observed value of x
(6), because $F_s(x)$ is a step function that jumps at each
observation. An algebraically valid value of D is:

$$D = \max_i \left( \max\left[ \left| F_s(x_i) - F_t(x_i) \right|,\ \left| F_s(x_{i-1}) - F_t(x_i) \right| \right] \right).$$

If  this  value  exceeds  the  tabulated  value,  the  hypothesis  that 
the  sample  came  from  the  theoretical  distribution  is  rejected. 

In this case, the theoretical distribution is the uniform, so
each day should produce 1/140th of the total number of cases.
Sample and theoretical cumulative distributions, and differences
between them as each case is identified, are shown in Table
1 and Figure 1. D=0.3452 and p=.09 (by linear interpolation of
tabled values), so the hypothesis of a uniform distribution is not
rejected.
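
As an illustration only, the algebraic form of D can be sketched in
Python using the case days listed in Table 1. The function name
ks_statistic and the variable names are assumptions made for this
sketch and are not part of the original analysis; the p-value lookup
from tabled values is not reproduced.

```python
# Days on which the 12 PCOR cases were diagnosed (from Table 1).
DAYS = [28, 41, 44, 45, 46, 48, 74, 130, 131, 135, 136, 136]
PERIOD = 140  # days of observation


def ks_statistic(days, period):
    """Kolmogorov-Smirnov D against a uniform distribution on the period."""
    n = len(days)
    d = 0.0
    for i, day in enumerate(sorted(days), start=1):
        f_t = day / period                # theoretical CDF at x_i
        at_step = abs(i / n - f_t)        # |F_s(x_i) - F_t(x_i)|
        before_step = abs((i - 1) / n - f_t)  # |F_s(x_{i-1}) - F_t(x_i)|
        d = max(d, at_step, before_step)
    return d


print(round(ks_statistic(DAYS, PERIOD), 4))  # 0.3452
```

The largest deviation arises just before the eighth case (day 130),
which is why the comparison with the previous step of the sample
distribution is needed.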

Kuiper 

Kuiper's goodness-of-fit test (7) has been shown to be more
sensitive to departures from randomness except when they involve
marked crowding of points on only a single end of the time line
(8). Since this is not true of our data, this test might be a
more powerful alternative. The data in Table 1 may also be used
to calculate Kuiper's test statistic K, which is defined as:

$$K = \sqrt{n}\,\left(D^{+} + D^{-}\right),$$

where n is the number of cases, $D^{+}$ is the largest positive value
of $F_s(x) - F_t(x)$, and $D^{-}$ is the absolute value of the largest
negative value, again with proper consideration for the location
of the largest vertical difference. For the PCOR incidence data,
K=3.4641(0.1571+0.3452)=1.7400 and p=.03 (by linear interpolation
of tabled values), and the null hypothesis is rejected.
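
The calculation of K can be sketched in the same way. This is a
minimal illustration under the assumption that the case days are
those of Table 1; the function name kuiper_statistic is hypothetical.

```python
import math

# Days on which the 12 PCOR cases were diagnosed (from Table 1).
DAYS = [28, 41, 44, 45, 46, 48, 74, 130, 131, 135, 136, 136]
PERIOD = 140  # days of observation


def kuiper_statistic(days, period):
    """Return (K, D_plus, D_minus) against a uniform distribution."""
    n = len(days)
    d_plus = 0.0   # largest amount by which the sample CDF exceeds the uniform
    d_minus = 0.0  # largest amount by which the uniform exceeds the sample CDF
    for i, day in enumerate(sorted(days), start=1):
        f_t = day / period
        d_plus = max(d_plus, i / n - f_t)          # just after the step at x_i
        d_minus = max(d_minus, f_t - (i - 1) / n)  # just before the step at x_i
    return math.sqrt(n) * (d_plus + d_minus), d_plus, d_minus


k, d_plus, d_minus = kuiper_statistic(DAYS, PERIOD)
print(round(d_plus, 4), round(d_minus, 4), round(k, 3))
```

Carried at full precision, K comes out at about 1.740; the value
1.7400 in the text results from using the deviations rounded to four
decimal places.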

Watson 

Instead of the two maximum deviations used in Kuiper's test,
Watson's goodness-of-fit test (7) uses a mean square deviation.
It appears that this test is especially powerful for small sample
sizes and is suitable for both unimodal and multimodal data. The
two separated clusters in the PCOR data suggest possible
application in this case. Watson's statistic is defined as:

$$U^2 = \sum_{i=1}^{n} F_t(x_i)^2 - \sum_{i=1}^{n} \frac{c_i F_t(x_i)}{n} + n\left[\frac{1}{3} - \left(\bar{F}_t - 0.5\right)^2\right],$$

where $c_i = 2i-1$, i is the event number from 1 to n, and
$\bar{F}_t$ is the mean of the $F_t(x_i)$. In this
case, $U^2$=5.3878-9.1167+12(1/3-(7.1/12-0.5)^2)=0.1703 and p=.07 (by
linear interpolation of tabled values). Although not
statistically significant (at p<.05), the results of the Watson
test might lead the researcher to consider the alternate
hypothesis worthy of continued investigation.
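
A short Python sketch of the Watson computation follows, again
assuming the case days of Table 1. The function name watson_u2 is
an assumption for illustration, and the three returned terms
correspond to the three column sums used in the text.

```python
# Days on which the 12 PCOR cases were diagnosed (from Table 1).
DAYS = [28, 41, 44, 45, 46, 48, 74, 130, 131, 135, 136, 136]
PERIOD = 140  # days of observation


def watson_u2(days, period):
    """Watson's U^2 against a uniform distribution on the period."""
    n = len(days)
    f_t = [day / period for day in sorted(days)]  # theoretical CDF values
    sum_sq = sum(f * f for f in f_t)              # sum of F_t(x_i)^2
    # c_i = 2i - 1 with i starting at 1, i.e. 2*index + 1 for a 0-based index
    sum_c = sum((2 * i + 1) * f for i, f in enumerate(f_t)) / n
    mean_f = sum(f_t) / n
    return sum_sq - sum_c + n * (1.0 / 3.0 - (mean_f - 0.5) ** 2)


print(round(watson_u2(DAYS, PERIOD), 4))  # 0.1703
```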

Scan 

If there is any information on the conjectured duration of
an epidemic, the scan statistic (10) may be a more powerful
alternative to the methods previously described. "The scan
statistic is the maximum number of observed cases in an interval
of preselected length, as the interval is allowed to scan, or
slide along, the time frame of interest." Suppose that the
investigator, because of anecdotal reports (this is fictitious),
believed that PCOR incidence would be greatest for seven-day
periods following liberties in foreign ports. The scan interval
is set to seven days, and the value of the scan statistic would be
five (cases). The significance level for this statistic,
P(n,N,r), for a particular n (maximum cases in the scan
interval), N (total number of cases identified), and r (ratio of
scan interval to total period of observation) can be determined
from tabled values (10) or approximated by P (11),

$$P(n, N, r) \approx P = \left(\frac{n}{r} - N + 1\right)\Pr(Y = n) + 2\Pr(Y \ge n + 1),$$

where Y has a binomial distribution with parameters N and p=r:

$$\Pr(Y = n) = \frac{N!}{(N-n)!\,n!}\, r^{n} (1-r)^{N-n}.$$

A  short  algorithm  (in  IBM  PC  BASIC)  to  compute  P  for  small  N  is 
found  in  the  appendix. 
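
For readers without access to BASIC, the same approximation can be
sketched in Python. This is an illustrative translation of the
formula given above, not the authors' program; the function names
binomial_pmf and scan_p_value are assumptions.

```python
from math import comb


def binomial_pmf(k, N, r):
    """Pr(Y = k) for Y binomial with parameters N and r."""
    return comb(N, k) * r**k * (1 - r) ** (N - k)


def scan_p_value(n, N, r):
    """Approximate P(n, N, r) as (n/r - N + 1) Pr(Y=n) + 2 Pr(Y>=n+1)."""
    pr_n = binomial_pmf(n, N, r)
    upper_tail = sum(binomial_pmf(k, N, r) for k in range(n + 1, N + 1))
    return (n / r - N + 1) * pr_n + 2 * upper_tail


# Seven-day interval in a 140-day period, 5 of 12 cases in the interval.
print(round(scan_p_value(5, 12, 1 / 20), 3))  # 0.015
```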

For our data n=5, N=12, and r=7/140=1/20, giving p=.015. Thus
the null hypothesis would be rejected. This technique, however,
is dependent upon selection of an appropriate interval, which
requires prior knowledge about durations of hypothesized point
epidemics. If the interval had been selected as 10 days, then
p=.052, and if 14 days, then p=.149. Thus, statistical
significance in this case would have resulted from a fortuitous
interval selection. In general, the sort of information
necessary for an informed selection of interval length would not
usually be available in exploratory studies, and selection after
the data are seen would lead to invalid alpha error estimates.
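
The dependence on interval length can be checked with a short Python
sketch that slides an interval along the case days of Table 1 and
applies the approximation for P. Treating a w-day interval as the
days start through start+w-1 is an assumption of this sketch, as are
the function names.

```python
from math import comb

# Days on which the 12 PCOR cases were diagnosed (from Table 1).
DAYS = [28, 41, 44, 45, 46, 48, 74, 130, 131, 135, 136, 136]
PERIOD = 140  # days of observation


def scan_statistic(days, window):
    """Largest number of cases in any run of `window` consecutive days."""
    # The maximum is always reached by an interval that starts on a case day.
    return max(sum(1 for d in days if start <= d < start + window)
               for start in days)


def scan_p_value(n, N, r):
    """Approximate P(n, N, r) as (n/r - N + 1) Pr(Y=n) + 2 Pr(Y>=n+1)."""
    pmf = [comb(N, k) * r**k * (1 - r) ** (N - k) for k in range(N + 1)]
    return (n / r - N + 1) * pmf[n] + 2 * sum(pmf[n + 1:])


for window in (7, 10, 14):
    n = scan_statistic(DAYS, window)
    p = scan_p_value(n, len(DAYS), window / PERIOD)
    print(window, n, round(p, 3))
# 7 5 0.015
# 10 5 0.052
# 14 5 0.149
```

The scan statistic stays at five cases for all three interval
lengths, so the change in p comes entirely from the larger ratio r.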

It should also be noted that the same a priori information
that would allow a good selection of interval length might also
be used to categorize time periods as an independent variable.
For these data, counts while underway for at least seven days
versus counts within seven days after port liberty could be used.
As such, other more traditional techniques may be applied. In
this case, a statistical test for differences between two Poisson
counts (2), adjusted for population-time, would be a good
alternative.
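
One common way to compare two Poisson counts with different
population-time is the conditional exact test: given the total
count, the first count is binomial with probability equal to its
share of the population-time. The sketch below assumes that
approach, which may differ from the specific procedure in reference
(2). The function name is hypothetical, and the counts in the usage
lines are arbitrary checks, not data from the cruise.

```python
from math import comb


def poisson_two_count_p(count_a, time_a, count_b, time_b):
    """Two-sided exact p-value for equal event rates in periods A and B.

    Conditional on the total count, count_a is binomial with
    probability time_a / (time_a + time_b) under the null hypothesis.
    """
    total = count_a + count_b
    share = time_a / (time_a + time_b)
    pmf = [comb(total, k) * share**k * (1 - share) ** (total - k)
           for k in range(total + 1)]
    observed = pmf[count_a]
    # Sum the probabilities of all outcomes no more likely than the observed.
    p = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p)


# Arbitrary sanity checks (not study data): equal counts over equal
# population-time give p = 1; a 10-to-0 split gives 2 * 0.5**10.
print(poisson_two_count_p(5, 1.0, 5, 1.0))
print(poisson_two_count_p(10, 1.0, 0, 1.0))
```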

Application  in  the  Dental  Setting 

Detection  of  unusual  disease  event  clusters  has  inherent 
clinical  value  due  to  current  emphasis  on  quality  control  and 
risk  management.  The  procedures  that  have  been  described  are 
relatively  simple  methods  that  will  allow  an  investigator  to 
apply  a  probabilistic  criterion  to  clustering  during  a  time 
period. 

Kolmogorov-Smirnov, Kuiper, Watson, and Scan statistics
differ in power and in the a priori information necessary to
justify their selection. In the absence of prior knowledge about
the particular type of epidemic being investigated, Kuiper's test
appears to be the best choice.

These  methods  are  appropriate  to  a  single  retrospective 
analysis  of  a  data  set.  Should  an  epidemic  potential  be 
identified  for  particular  infections  or  complications,  it  is 
appropriate  that  implementation  of  continuous  monitoring  be 
considered.  Such  surveillance  methods  may,  for  example,  involve 
the  comparison  of  current  rates  to  historical  risk  or  CUSUM 
continuous monitoring (11). These ongoing methods will enhance
the  possibility  of  identifying  as  yet  unknown  causative  agents. 
DATA TABLE FOR COMPUTING
KOLMOGOROV-SMIRNOV D, KUIPER K, AND WATSON U^2 STATISTICS

Days of Observation = 140
Number of Cases Identified = n = 12

Case  Day (x)  Fs(x)   Ft(x)     Fs(x)-Ft(x)   Fs(x[i-1])-Ft(x)   c    Ft(x)^2   cFt(x)/n
1     28       1/12    28/140    -0.1167       -0.2000            1    0.0400    0.0167
2     41       2/12    41/140    -0.1262       -0.2095            3    0.0858    0.0732
3     44       3/12    44/140    -0.0643       -0.1476            5    0.0988    0.1310
4     45       4/12    45/140     0.0119       -0.0714            7    0.1033    0.1875
5     46       5/12    46/140     0.0881        0.0048            9    0.1079    0.2464
6     48       6/12    48/140     0.1571 (+)    0.0738            11   0.1176    0.3143
7     74       7/12    74/140     0.0548       -0.0286            13   0.2794    0.5726
8     130      8/12    130/140   -0.2619       -0.3452 (-)        15   0.8622    1.1607
9     131      9/12    131/140   -0.1857       -0.2690            17   0.8756    1.3256
10    135      10/12   135/140   -0.1310       -0.2143            19   0.9298    1.5268
11    136      11/12   136/140   -0.0548       -0.1381            21   0.9437    1.7000
12    136      12/12   136/140    0.0286       -0.0548            23   0.9437    1.8619
SUM                    7.1000                                          5.3878    9.1167

Unless otherwise noted, the subscript assigned to x in "(x)" is i. (+) is the
largest positive difference and (-) the largest negative difference between
the sample and theoretical distributions.
APPENDIX


Program for computing P for the scan statistic when N is small


10 'ENTER APPROPRIATE VALUES IN NEXT THREE LINES
20 N=12 'N=TOTAL NUMBER OF CASES FOUND
30 R=1/20 'R=RATIO OF INTERVAL SIZE TO TOTAL PERIOD
40 S=5 'S=n=MAXIMUM NUMBER OF CASES FOUND IN MOVING INTERVAL
50 P=0 'P ACCUMULATES PR(Y<=J)
60 FOR J=0 TO S
70 NUM=1:DEN=1
80 FOR I=(N-J+1) TO N:NUM=NUM*I:NEXT I
90 NUM=NUM*R^J*(1-R)^(N-J)
100 FOR I=1 TO J:DEN=DEN*I:NEXT I
110 PR=NUM/DEN:P=P+PR 'PR=PR(Y=J)
120 PRINT USING "#.###### ";PR,1-P
130 NEXT J
140 PRINT
150 PRINT "PROBABILITY (S,N,R) = "
160 PRINT USING "#.######";(S/R-N+1)*PR+2*(1-P)
170 END
[Figure 1 plot. X-axis: sequential days of PCOR incidence recording.]


Figure 1. Theoretical, Ft(x), and sample, Fs(x), cumulative
distribution functions for the PCOR data. The theoretical
distribution is the uniform, and D+ and D- are the maximum positive
and negative differences between the empirical data and this
distribution. The lines on the x-axis indicate the incidence
of events, which are added to the cumulative sample distribution.
Both the Kolmogorov-Smirnov and Kuiper test statistics can be
estimated graphically or algebraically.